DIGITAL HEALTH
SAGE Publications
Preprints posted in the last 30 days, ranked by how well they match DIGITAL HEALTH's content profile, based on 11 papers previously published here. The average preprint has a 0.07% match score for this journal, so anything above that is already an above-average fit.
Dai, Y.; Lu, Y.; Li, Y.; Li, M.; Jia, Y.; Zhou, Z.; Li, C.
Background: Individuals with severe mobility impairments (SMI) often experience significant psychological distress and chronic pain. Virtual walking (VW) presents an innovative rehabilitation approach to improve mood and alleviate pain. This study aimed to develop a home-based VW system with integrated mood and symptom tracking and to report on its feasibility and usability in a user study with individuals with SMI. Methods: A multidisciplinary, iterative framework guided the system's development. Following initial contextual research and design iterations, a user study was conducted with 11 participants with SMI. A repeated-measures pre-post design was employed. Feasibility and usability were primarily assessed through post-study qualitative interviews, analyzed via content analysis. Changes in mood and symptoms were measured immediately before and after each session. Momentary mood was captured using an in-virtual reality (in-VR) two-dimensional (2D) affect grid, while embedded single-item state ratings were used to track anxiety, depressed mood, and pain. Daily mood changes and symptom trajectories were analyzed using logistic regression and generalized estimating equations (GEE), respectively. Results: Contextual research guided the system design towards enhancing accessibility, ergonomics, and therapeutic engagement. The final VW system featured three core modules: locomotion, multi-sensory feedback, and mood/symptom tracking. Qualitative analysis of the user study revealed high acceptance of the VW system, alongside challenges related to content variety and hardware ergonomics. Each intervention session was significantly associated with an immediate positive mood shift (odds ratio (OR) = 1.83), as measured by the affect grid. Furthermore, GEE models revealed a significant reduction in self-reported depression and anxiety symptoms over the intervention period (all P < 0.01).
Conclusions: This study confirms the feasibility and acceptability of the novel VW system for home-based use by individuals with SMI. The preliminary evidence suggests the system has high potential as a tool for improving mood and alleviating psychological distress. Future large-scale randomized controlled trials are warranted to establish its clinical efficacy. Trial registration number: NCT07073144, 07/17/2025.
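For readers interpreting the reported odds ratio, consider a standard logistic regression in which the log-odds of a positive mood shift are linear in a binary indicator $x$. This model is an assumption, because the abstract does not give the exact specification. Under it, the odds ratio is the exponentiated coefficient, so OR = 1.83 corresponds to:

```latex
\operatorname{logit} P(y = 1 \mid x) = \beta_0 + \beta_1 x, \qquad
\mathrm{OR} = \frac{\mathrm{odds}(y = 1 \mid x = 1)}{\mathrm{odds}(y = 1 \mid x = 0)} = e^{\beta_1}
\;\Rightarrow\; \beta_1 = \ln 1.83 \approx 0.604
```

Under this assumed model, the odds of the outcome are about 83% higher when $x = 1$ than when $x = 0$.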
Alrefaei, D.; Huang, K.; Sukumar, A.; Djamasbi, S.; Tulu, B.; Davis Martin, R.
Eye tracking is recognized as a gold standard for measuring visual attention and cognitive engagement. In this study, it offers a useful lens for understanding how primary care providers balance patient communication with navigation of electronic health records (EHRs). We used wearable eye tracking to collect visual information processing behavior and conducted a retrospective think-aloud protocol to examine how primary care clinicians processed suicide-related information (CAT-MH®) embedded in the EHR during simulated visits. Eye-movement data showed substantial visual attention directed toward the EHR, indicating added information-processing demands during communication. Retrospective think-aloud data supported the analysis of eye movement data by revealing that clinicians searched multiple record sections to verify risk indicators and often postponed suicide-related discussions until confirming relevant results. These findings illustrate how EHR-embedded screening tools shape clinical attention and encounter flow.
Singh, P.; Gonuguntla, S.; Chen, E.; Pradhan, A.; Becker, I.; Xu, N.; Steel, B.; Arkam, F.; Yakdan, S.; Benedict, B.; Naveed, H.; Wang, W.; Guo, W.; Wilt, Z.; Badhiwala, J.; Hafez, D.; Ogunlade, J.; Ray, W. Z.; Ghogawala, Z.; Kelleher, C.; Greenberg, J. K.
Objective: Evaluating and monitoring patients with cervical spondylotic myelopathy (CSM) remains a challenge due to limited tools for assessing objective neurological disability longitudinally and in the home environment. Given their prevalence and low cost, mobile health (mHealth) technologies, and smartphones specifically, offer a promising approach to fill this gap. This study explored stakeholder perspectives on the role of mHealth in CSM monitoring to inform development of a smartphone-based assessment application. Methods: We conducted semi-structured interviews with 15 patients with CSM and 14 healthcare providers (spine surgeons, physical therapists, and occupational therapists). Interviews explored current assessment practices, perceived limitations, and attitudes toward mHealth integration. Data were analyzed using thematic analysis. Results: Two major themes emerged from provider interviews: (1) diagnosing and monitoring CSM is challenging due to limitations in current tools, and (2) mHealth presents significant opportunities but requires thoughtful integration. Providers described current methods and technologies, clinical signs and symptoms, and challenges evaluating patients. Current tools were viewed as inadequate for precision medicine, with inter-rater variability and inability to capture real-world function. Within the second theme, providers identified ways mHealth could improve care, challenges for integration, and practical implementation considerations. Patients expressed strong interest in objective, longitudinal monitoring of gait, dexterity, and daily function. Conclusions: Stakeholders recognized substantial potential for mHealth to address unmet needs in CSM assessment. Successful implementation will require intuitive design, electronic medical record integration, and attention to accessibility. These findings provide a foundation for user-centered development of digital health tools in CSM care.
Bauer, M. P.; van Tol, E. M.; Constansia, T. K. M.; King, L.; van Buchem, M. M.
Background: Typing in the electronic health record (EHR) takes up healthcare providers' time and cognitive space and constitutes a substantial administrative burden contributing to high burnout rates in healthcare. Ambient digital scribes may alleviate this problem. Objective: To investigate the effect of using Autoscriber, an ambient digital scribe, on healthcare providers' administrative workload and on the quality of medical notes in the EHR. Methods: A 26-week study period was randomized into weeks when healthcare providers were allowed to use Autoscriber (intervention weeks) and weeks when they were not (control weeks) in a 2:1 ratio. Workload was assessed by comparing the number of characters typed in the medical note during control weeks with the number of modifications made to the summary produced by Autoscriber during intervention weeks. Quality of the medical note was measured by having a large language model (LLM) count the number of hallucinations, incorrect negations, context conflation errors, speculations, other inaccuracies, omissions, succinctness errors, organization errors, and terminology errors per medical note. Results: Between 1 November 2024 and 30 April 2025, 35 healthcare providers from 14 different specialties recorded 387 consultations in intervention weeks and 142 in control weeks. The median number of characters typed per medical note was 1079 in control weeks, and the median number of modifications necessary to produce the medical note was 351 in intervention weeks, compatible with a lower workload. All types of errors occurred significantly less frequently in notes made with the support of Autoscriber than in those without, except for speculations, where the difference did not reach statistical significance. Conclusions: The use of Autoscriber resulted in a lower workload and a higher quality of the medical note.
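The Autoscriber abstract compares characters typed in control weeks with "modifications" to the Autoscriber summary, but it does not define how modifications were counted. One plausible operationalization is the character-level edit (Levenshtein) distance between the AI draft and the signed note. This is an assumption, not confirmed by the source. Below is a minimal Python sketch; the function name `edit_distance` and the example note text are hypothetical.

```python
def edit_distance(draft: str, final: str) -> int:
    """Character-level Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn `draft` into `final`.
    (Hypothetical stand-in for the study's undefined 'modifications' count.)"""
    # prev[j] = distance between the first i-1 chars of draft and final[:j]
    prev = list(range(len(final) + 1))
    for i, dc in enumerate(draft, start=1):
        curr = [i] + [0] * len(final)
        for j, fc in enumerate(final, start=1):
            cost = 0 if dc == fc else 1
            curr[j] = min(prev[j] + 1,         # delete dc from the draft
                          curr[j - 1] + 1,     # insert fc into the draft
                          prev[j - 1] + cost)  # substitute dc -> fc (or keep a match)
        prev = curr
    return prev[-1]


# Invented example: an AI draft and the clinician's signed version
draft = "Patient reports cough for 3 days."
final = "Patient reports dry cough for 5 days."
print(len(final), edit_distance(draft, final))
```

Under this assumption, typing a note from scratch costs roughly `len(final)` keystrokes, while editing a draft costs only the edit distance. That is why a smaller number of modifications than typed characters can be read as a lower workload.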
Al-Dabbas, Z.; Khandakji, L.; Al-Shatarat, N.; Alqaisiah, H.; Ibrahim, Y.; Awed, T.; Baik, H.; Dawoud, M.; Ali, R. A.-H.; Telfah, Z.; Al-Hmaid, Y.; Alsharkawi, A.
Artificial intelligence (AI) is increasingly integrated into healthcare delivery, yet patient acceptance in resource-constrained settings remains incompletely characterized. This study assessed attitudes toward AI-supported care among patients attending hospitals in three Jordanian governorates (Amman, Balqa, Irbid) and examined demographic and digital literacy correlates of acceptance. In a cross-sectional survey (n = 500 complete questionnaires), participants rated their exposure to AI in healthcare and five attitudinal domains (perceived usefulness or performance expectancy; trust and transparency; privacy and perceived risks; empathy and human interaction; and readiness or behavioral intention) using 25 items on 5-point Likert scales. Patients expressed conditional optimism: empathy and human interaction was most strongly endorsed (M = 4.33, SD = 0.58), alongside relatively high perceived usefulness (M = 3.97, SD = 0.68), while trust and transparency (M = 3.57, SD = 0.74) and readiness (M = 3.66, SD = 0.90) were moderate to high; privacy and risk concerns were moderate (M = 3.51, SD = 0.77), and self-reported exposure was lowest (M = 2.57, SD = 1.07). The highest-agreement item indicated a preference for AI to work alongside physicians rather than be relied on alone (M = 4.47, SD = 0.81). Trust and transparency and perceived usefulness were positively associated with readiness (r = 0.48 and r = 0.44, respectively; p < .001), while privacy and perceived risks were negatively correlated with trust and usefulness. In multivariable regression adjusting for gender, age group, education, prior AI health app or device use, and self-rated digital skill, lower educational attainment (less than high school and high school) predicted reduced readiness, whereas higher digital skill predicted increased readiness (R² = 0.101).
These findings suggest that implementation strategies in Jordan should emphasize human involvement alongside AI, transparent communication and governance, and interventions that build digital confidence and reduce readiness gaps linked to education. Author summary: AI is increasingly used in healthcare, for example to support diagnosis, triage, and treatment decisions. Whether these tools are accepted by patients depends not only on how well they work, but also on whether patients trust them, understand how they are used, and feel their privacy is protected. Evidence on patient views in middle-income and resource-constrained settings is still limited. We surveyed 500 patients attending hospitals in three Jordanian governorates to understand how they view AI-supported care. Patients generally expected AI to be useful, but they strongly preferred that clinicians remain actively involved and that AI supports rather than replaces physicians. Trust and perceived usefulness were closely linked to willingness to accept AI-enabled care, while privacy concerns were present and shaped trust. Readiness to accept AI was lower among participants with lower educational attainment and higher among those with greater self-rated digital skill. These findings suggest that successful implementation in Jordan should prioritize transparent communication, strong privacy safeguards, and human-centered workflows, while also strengthening digital confidence to avoid widening gaps in acceptance.
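The domain means and correlations in the Jordan survey abstract follow a common survey workflow. Each respondent's items within a domain are averaged, and the resulting domain scores are then correlated across respondents. The sketch below assumes the reported r values are Pearson correlations. It uses made-up responses rather than study data, and the item groupings and function names are illustrative.

```python
from math import sqrt
from statistics import fmean


def domain_score(item_responses):
    """Mean of one respondent's 1-5 Likert items within a single domain."""
    return fmean(item_responses)


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical respondents: (trust items, readiness items), five items each
respondents = [
    ([4, 4, 3, 4, 5], [4, 5, 4, 4, 5]),
    ([2, 3, 3, 2, 3], [3, 2, 3, 3, 2]),
    ([5, 4, 5, 4, 4], [5, 4, 4, 5, 5]),
    ([3, 3, 4, 3, 3], [3, 4, 3, 3, 4]),
]
trust = [domain_score(t) for t, _ in respondents]
readiness = [domain_score(r) for _, r in respondents]
print(round(pearson_r(trust, readiness), 2))
```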
Chin, A. T.; Zhu, N.; Kingsley, T. C.; Mynampati, P.; Phipps, Y.; Romanov, A.; Vangala, S.; Weng, M.; Wisk, L. E.; Woo, H.; Mafi, J. N.; Lukac, P. J.
Background: EHR documentation and chart review contribute to clinician workload and burnout. To alleviate pre-charting burden, Epic has released a new generative AI chart summarizer tool, which has become widely adopted; however, its impact has not been examined in randomized trials. Objective: To evaluate whether access to an Epic generative AI chart summarization tool reduces cognitive task load among ambulatory providers compared with usual care. Methods: Two-arm, parallel-group randomized controlled trial among ambulatory clinicians across multiple specialties. Clinicians will be randomized 1:1 to tool access versus usual care for 90 days. The primary outcome is change in a 4-item physician task load (PTL) measure adapted for the pre-charting task. Exploratory outcomes include EHR-derived time metrics (Caboodle and Signal), professional fulfillment/burnout (PFI), usability (SUS), clinician satisfaction, an aggregated patient-experience item from CG-CAHPS, and reported safety-related metrics. Ethics and Dissemination: Analyses will use clinician-level survey responses and aggregated EHR metrics; no patient-level protected health information will be included in the analytic dataset. Results will be disseminated via preprint and peer-reviewed publication.
Article summary: Strengths and limitations of this study
- This study is a 3-month pragmatic randomized controlled trial evaluating a native EHR-embedded generative AI tool that summarizes prior clinical notes for ambulatory encounters.
- The primary outcome uses a validated cognitive task load instrument adapted specifically for pre-charting activities.
- Exploratory outcomes include objective EHR-derived time metrics, validated psychometric measures of burnout and professional fulfillment, and clinician-reported survey measures assessing perceived usefulness of the tool.
- The trial is single-center, which may limit generalizability, and the intervention is optional-use and unblinded, which may attenuate observed effects and introduce performance bias.
Bladder, K. J. M.; Verburg, A. C.; Arts-Tenhagen, M.; Willemsen, R.; van den Broek, G. B.; Driessen, C. M. L.; Driessen, R. J. B.; Robberts, B.; Scheffer, A. R. T.; de Vries, A. P.; Frenzel, T.; Swillens, J. E. M.
Background: Generative artificial intelligence (GenAI) in healthcare may reduce administrative burden and enhance quality of care. Large language models (LLMs) can generate draft responses to patient messages using electronic health record (EHR) data. This could mitigate the increased workload related to high message volumes. While the effectiveness and feasibility of these GenAI tools have been studied in the United States, evidence from non-English contexts is scarce, particularly regarding user experience. Objective: This study evaluated the effectiveness, feasibility, and barriers and facilitators of implementing Epic's Augmented Response Technology (Art) GenAI tool (Epic Systems Corporation, Verona, WI, USA) in a Dutch academic healthcare setting among a broad range of end users. It explored healthcare professionals' (HCPs') usage metrics, expectations, and early user experiences. Methods: We used a hybrid type 1 effectiveness-implementation design. HCPs from four clinical departments (dermatology, medical oncology, otorhinolaryngology, and pulmonology) participated in a six-month study. Effectiveness of Art was assessed using efficiency indicators from Epic (including all InBasket users in the hospital) and survey scales measuring well-being and clinical efficiency at three time points: PRE, POST-1 (1 month), and POST-2 (4 months). Feasibility of Art was evaluated through adoption indicators from Epic and survey scales on use and usability. Barriers and facilitators of Art implementation were collected through the survey and thematized using the NASSS framework (Nonadoption, Abandonment, Scale-up, Spread and Sustainability). Results: In total, 237 unique HCPs generated 8,410 drafts. Review and drafting times were similar for users with and without Art. Perceived clinical efficiency declined significantly from PRE to POST-2, while well-being remained unchanged.
Adoption was initially high but decreased over time, averaging 16.7% across departments. Usability and intention-to-use scores also declined significantly. Qualitative findings highlighted time savings, well-structured drafts, and patient-centered language as facilitators. Reported barriers included limited impact on time, low practical utility, content inaccuracies, and style misalignment. Conclusions: This evaluation of a GenAI tool for patient-provider communication in a non-English academic hospital revealed mixed perceptions of effectiveness and feasibility. High initial expectations contrasted with limited perceived impact on time savings, well-being, and clinical efficiency, alongside declining adoption and usability. Barriers and facilitators revealed contrasting views. These findings underscore the need for a workflow for handling user feedback and guidance on clinical responsibilities, along with clear communication about the tool's purpose and limitations to manage expectations. Additionally, establishing consensus on a set of quality indicators, and on the thresholds at which a GenAI tool is considered sufficiently robust, will be critical for responsible scaling of GenAI in clinical practice.
Gai, S.; Li, D.; Borchert, G.; Huang, F.; Leng, X.; Huang, J.
Background: Short-video platforms have become increasingly important sources of health information for the general public. However, the informational quality and dissemination patterns of content related to specific therapeutic modalities, such as enhanced external counterpulsation (EECP), remain insufficiently characterized. This study aimed to evaluate the informational quality of EECP-related videos on a short-video platform and to examine the relationship between content quality and user engagement. Methods: A cross-sectional content analysis was conducted on EECP-related short videos identified through keyword-based searches. Informational quality was independently assessed using four validated instruments: the Global Quality Scale (GQS), the Journal of the American Medical Association (JAMA) benchmark criteria, the modified DISCERN instrument (mDISCERN), and the Video Information and Quality Index (VIQI). Video characteristics and user engagement metrics were extracted and analyzed. Results: Overall, EECP-related videos demonstrated low-to-moderate informational quality across all assessment tools. Longer video duration was consistently associated with higher informational quality scores. In contrast, user engagement metrics, including the number of likes and comments, showed weak or negative associations with informational quality. Compared with videos addressing other coronary heart disease treatments, EECP-related videos were less frequently represented and received lower overall engagement. Conclusions: EECP-related content on short-video platforms is characterized by limited visibility and modest informational quality, with a notable misalignment between user engagement and informational value. These findings suggest that clinically relevant but complex therapies such as EECP may be structurally disadvantaged in short-video health communication environments.
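The EECP abstract reports associations between video duration, engagement, and quality scores but does not name the statistic used. Spearman's rank correlation is a common choice for ordinal scores such as the GQS, and the sketch below assumes it. It computes Spearman's rho as the Pearson correlation of average ranks, so tied values are handled, and it uses invented durations and scores rather than study data.

```python
from statistics import fmean


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j across the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = fmean(rx), fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical videos: duration in seconds and GQS score (1-5)
durations = [15, 42, 60, 95, 180, 30]
gqs = [1, 2, 3, 3, 4, 2]
print(round(spearman_rho(durations, gqs), 3))
```

Because rho depends only on ranks, it captures monotone but non-linear trends such as "longer videos tend to score higher" without assuming equal spacing between GQS levels.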
Yousaf, M. N.; Anwar, M. N.; Naveed, N.; Haider, U.
Background: Tinnitus affects a substantial proportion of the global population and can severely disrupt sleep, mood, and daily functioning, yet the quality of mobile health apps designed for tinnitus management remains highly variable. Traditional evaluation methods, including clinical trials, expert rating scales, and small-scale surveys, rarely capture large-scale, feature-level feedback from real-world users, leaving a gap in understanding which app characteristics drive sustained engagement and satisfaction. Methods: This study analysed 342,520 English-language reviews from 84 tinnitus-related apps on iOS and Android collected between 2015 and 2025. A pipeline first applied VADER-based preprocessing and sentiment assignment, then trained a graph neural network aspect-based sentiment analysis (GNN-ABSA) model operating on sentence-level dependency graphs to infer feature-level sentiment for domains such as sound therapy, sleep support, pricing, advertisements, stability, and user interface. Results: The GNN-ABSA model achieved an accuracy of 84.4% and a macro F1 score of 0.829 on unseen aspect-level test data, indicating stable performance across sentiment classes. Therapeutic features such as sound masking and sleep support were associated with predominantly positive sentiment, whereas pricing, advertisements, background playback, and technical stability attracted more neutral or negative feedback over the ten-year period. Conclusions: Large-scale, graph-based feature-level sentiment analysis provides a user-centred perspective that complements clinical trials and expert app quality ratings, offering actionable guidance for developers seeking to prioritize design improvements, supporting clinicians in recommending suitable apps to patients, and informing the design of more explainable and user-driven digital health tools. Trial Registration: Not applicable. This study analysed publicly available app store reviews and did not involve human participants.
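Two steps in the tinnitus review pipeline are easy to make concrete.

- VADER returns a compound score in [-1, 1], which must be cut into sentiment labels. The ±0.05 cut-offs below are the convention suggested in the VADER documentation, but the abstract does not state which thresholds the authors used.
- The reported macro F1 is the unweighted mean of per-class F1 scores, so a rare class counts as much as a common one.

Below is a minimal, dependency-free Python sketch; the function names and example labels are hypothetical.

```python
def vader_label(compound: float) -> str:
    """Map a VADER compound score in [-1, 1] to a label using the
    conventional +/-0.05 thresholds (an assumption for this study)."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1s.append(f1)
    return sum(f1s) / len(f1s)


# Hypothetical compound scores and aspect-level predictions
scores = [0.62, -0.41, 0.01, 0.33]
print([vader_label(s) for s in scores])
y_true = ["positive", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "negative", "neutral"]
print(round(macro_f1(y_true, y_pred), 3))
```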
Guo, Y.; Hu, D.; Yang, Z.; Chow, E.; Tam, S.; Perret, D.; Pandita, D.; Zheng, K.
Structured Abstract. Objective: The use of ambient AI documentation tools is rapidly growing in US hospitals and clinics. Such tools generate the first draft of clinical notes from scribed patient-provider conversations, which clinicians can then review and edit before signing them into the electronic health record (EHR). Understanding how and why clinicians make modifications to AI-generated drafts is critical to improving AI design and clinical efficiency, yet it has been under-studied. This study aims to address this gap. Materials and Methods: We conducted semistructured interviews with 30 clinicians from University of California, Irvine Health who used a commercial ambient AI tool in routine outpatient care. We invited them to describe how and why they edited AI drafts, based on both their personal experience and a review of real-world examples identified in our previous studies. Results: Modifications to AI drafts were primarily made to improve clinical accuracy and specialty-specific precision, reduce medico-legal and liability risk, and meet billing, coding, and documentation standards. Such editing was necessary because of transcription errors, speaker attribution mistakes, overconfident statements without evidence, missing key clinical details, and the AI's lack of information about the patient context. Conclusion and Discussion: Improving ambient AI documentation will require coordinated effort from vendors, institutions, and clinicians. Key targets include core model reliability (e.g., transcription accuracy), specialty- and encounter-level customization, clinician-level personalization, more effective EHR integration, and institutional support (e.g., training, governance, and standardized review guidance), complemented by clinicians' adaptive communication strategies that strengthen human-AI collaboration.
Ori, E. M.; Baay, C.; Ester, M.; Toohey, A. M.
The ubiquitous use of digital tools may be beneficial for improving physical activity across diverse populations. It remains unknown, however, how well publicly available, cost-free physical activity apps adhere to behaviour change techniques, and how users rate these apps. This study aimed to explore the number of publicly available physical activity apps and the relationships among behaviour science techniques, subjective quality, and user ratings. We conducted an exploratory content analysis of 17 apps meeting the inclusion criteria. The App Behaviour Change Scale (ABACUS) and Mobile App Rating Scale (MARS) were used to code each downloaded app for behaviour change techniques, app functionality, and subjective quality. App store user ratings were also collected, along with descriptive data about each app. All apps were commercially affiliated, targeted adult populations, and centered on changing behaviour, setting goals, and addressing physical health. No app addressed all 21 ABACUS items; apps included 12.8 ± 2.4 indicators, ranging from 8 to 18. The three most common ABACUS indicators were: i) collection of baseline information, ii) instructional physical activity (PA) content, and iii) the ability of the app to give the user feedback. The three least common ABACUS indicators were: i) ability to export data, ii) consequences for physical activity dis/continuance, and iii) allowing the user to plan for barriers. No app included all 12 MARS focus areas; 94.1% of apps allowed goal setting, 58.8% addressed physical health, and 41.2% included a mindfulness focus. Linear regressions explored relationships with app user ratings; aggregated MARS domains accounted for 54% of the variance. Publicly available physical activity apps may be a useful approach to improving physical activity uptake and adherence among harder-to-reach populations, including low socioeconomic status groups.
App developers should consider incorporating more behaviour change techniques within cost-free apps to improve user uptake and, ultimately, health outcomes associated with physical activity. Author Summary: Digital technology permeates all facets of life and all populations, and may contribute to improved health behaviours, including physical activity. However, access to supportive technology may be limited by cost, for example, as many popular physical activity apps require paid subscriptions. It is unknown whether cost-free physical activity apps adhere to behaviour change recommendations and how these apps are rated by users. This research explored cost-free, publicly available physical activity apps and their relationships with behaviour change techniques and app-store user ratings. Only 17 apps met the inclusion criteria, and these were assessed against one behaviour change scale and one app quality scale. All apps had commercial motivations and focused on physical activity for adult populations. Most commonly, apps collected user information at baseline, provided physical activity instructional content, and provided feedback to users. Apps were generally rated positively by users based on app-store star ratings. Cost-free physical activity apps may be useful tools for improving physical activity among individuals who are limited by their socioeconomic situation. However, greater emphasis on evidence-based behaviour change approaches may be necessary to improve health outcomes for users.
Ng, J. Y.; Bhavsar, D.; Krishnamurthy, M.; Dhanvanthry, N.; Fry, D.; Kim, J. W.; King, A.; Lai, J.; Makwanda, A.; Olugbemiro, P.; Patel, J.; Virani, I.; Ying, E.; Yong, K.; Zaidi, A.; Zouhair, J.; Lee, M. S.; Lee, Y.-S.; Nesari, T. M.; Ostermann, T.; Witt, C. M.; Zhong, L.; Cramer, H.
Background: Artificial intelligence chatbots (AICs) are increasingly being integrated into scholarly publishing, with the potential to automate routine editorial tasks and streamline workflows. In traditional, complementary, and integrative medicine (TCIM) publishing, editorial and peer review processes can be particularly complex due to diverse methodologies and culturally embedded knowledge systems, presenting unique opportunities and challenges for AIC adoption. Methods: An anonymous, online cross-sectional survey was distributed to the editorial board members of 115 TCIM journals. The survey assessed familiarity with and current use of AICs, perceived benefits and challenges, ethical concerns, and anticipated future roles in editorial workflows. Results: Of 5119 invitations, 217 eligible participants completed the survey. While approximately 70% of respondents reported familiarity with AI tools, over 60% had never used AICs for editorial tasks. Editors expressed the strongest support for text-focused applications, such as grammar and language checks (81.0%) and plagiarism/ethical screening (67.4%). Most respondents (82.8%) believed that AICs would be important or very important to the future of scholarly publishing; however, the majority (65.3%) reported that their journals lacked AI-specific policies and training programs to guide editors and peer reviewers. Conclusions: Most TCIM editors believe that AICs have the potential to support routine editorial functions, but adoption into editorial and peer review processes remains limited due to practical, ethical, and institutional barriers. Journals should provide additional training and guidance to direct responsible and ethical use if AICs are to be adopted in TCIM academic publishing.
Schmiedmayer, P.; Johnson, A.; Schuetz, N.; Kollmer, L.; Goldschmidt, P.; Delgado-SanMartin, J.; Zhang, K.; Mantena, S. D.; Tolas, A.; Montalvo, S.; Raimrez Posada, M.; O'Sullivan, J. W.; Oppezzo, M.; King, A. C.; Rodriguez, F.; Ashley, E.; Lawrie, A.; Kim, D. S.
Background: Cardiovascular disease remains the leading cause of global morbidity and mortality. The original My Heart Counts smartphone application demonstrated the feasibility of large-scale, fully digital recruitment and trial conduct, but was limited by platform exclusivity and the need for human experts to create text-based behavioral interventions. Methods: The next-generation My Heart Counts study is a prospective, observational cohort study with an embedded randomized crossover trial evaluating personalized text-based coaching prompts, available in both English and Spanish. All study and trial operations will be conducted via the My Heart Counts smartphone application, redesigned using the open-source Stanford Spezi framework to support iOS, with a planned Android release in 2027. The target enrollment is N=15,000 adults across the United States and United Kingdom. The study establishes a comprehensive digital biobank by synthesizing passive mobile health data (steps, flights climbed, heart rate, sleep, workouts), raw sensor data (e.g., accelerometry), longitudinal clinical surveys, active tasks (6-minute walk test and 12-minute Cooper run test), electrocardiograms (ECG), and electronic health record (EHR) data integrated via HL7 FHIR protocols. The embedded trial evaluates the effect on daily physical activity of text-based coaching prompts generated by a large language model (LLM) grounded in the Transtheoretical Model of Change, compared with generic prompts. Planned Analysis: The primary endpoint of the randomized crossover trial is the change in daily step count between the LLM-driven and generic text-based intervention arms, analyzed using mixed-effects models. Secondary endpoints include changes in mean active minutes and calorie burn over each intervention week.
Other analyses include changes in submaximal (6-minute walk test) and maximal (Cooper 12-minute run test) cardiorespiratory fitness, changes to sensor-derived biomarkers (e.g., sleep quality, resting heart rate, and heart rate variability), and association of sensor-derived biomarkers with EHR-confirmed clinical outcomes. Conclusions: By utilizing autonomous, LLM-driven coaching, modular software design, and cross-platform accessibility, our smartphone application-based study will provide a scalable model for inclusive and decentralized preventive care of patients with cardiovascular disease. Trial Status: Recruitment commenced in March 2026 and is ongoing.
Adekunle, T.; Ohaeche, J.; Adekunle, T.; Adekunle, D.; Kogbe, M.
Background: Artificial intelligence is increasingly embedded in healthcare delivery. Its legitimacy depends on institutional governance, not technical performance alone. Prior research has centered on clinicians and patients. Less attention has been given to cybersecurity professionals who sustain the digital infrastructures that support health AI. This study examines how cybersecurity professionals conceptualize AI as clinical infrastructure and how these interpretations shape understandings of trust, risk, and oversight. Methods: Guided by sociotechnical systems theory and institutional trust scholarship, we conducted semi-structured in-depth interviews with twenty cybersecurity professionals working in healthcare-relevant domains. Participants were recruited through professional networks and LinkedIn outreach. Interviews were conducted between May and August 2025. They were audio-recorded and transcribed verbatim. Data were analyzed using qualitative content analysis with constant comparison. Two researchers independently coded transcripts and refined themes through iterative discussion. The study received Institutional Review Board approval. Results: Participants described health AI as an augmented clinical infrastructure. They emphasized that AI extends workflow capacity but requires sustained human oversight. Healthcare data systems were characterized as fragmented and vulnerable. Breaches were treated as anticipated events. Trust in AI was described as contingent and built over time through visible accountability. Cybersecurity stewardship was framed as foundational to institutional trustworthiness. Conclusions: Health AI credibility emerges through governance practices that demonstrate accountability. Cybersecurity professionals and institutional stakeholders jointly shape trust in digitally mediated healthcare systems through governance decisions that signal accountability.
Alkeyeva, R.; Nagiyev, I.; Kim, D.; Nurmanova, B.; Omarova, Z.; Varol, H. A.; Chan, M.-Y.
Background: The growing interest in applying artificial intelligence to personalized nutrition is challenged by the complexity of dietary advice, which must balance health, economic, and personal factors. Although automated solutions using either Linear Programming (LP) or Large Language Models (LLMs) already exist, both have significant drawbacks: LP often lacks personalization, whereas LLMs can be unreliable for precise calculations. Objectives: To develop and assess a model that integrates a Mixed Integer Linear Programming (MILP) solver with an LLM to generate personalized meal plans, and to compare it with standalone LLM and MILP models. Methods: The proposed hybrid MILP+LLM model first uses an LLM (GPT-4o) to filter a unified food dataset (n=297), which combines regional Central Asian and global food items, according to the user's profile. The filtered list of food items is then passed to a MILP solver, which identifies the top 10 optimal solutions. Finally, the LLM chooses the most appropriate meal plan from this set of solutions. The model was evaluated on five synthesized, clinically complex patient profiles sourced from Adilmetova et al. [4]. The hybrid model was compared against standalone MILP and LLM models on Nutrient Accuracy, Personalization, Practicality, and Variety, using 5-point Likert-scale ratings analyzed with Kruskal-Wallis and post hoc Dunn's tests. Results: The proposed MILP+LLM model achieved balanced performance, scoring more than 3.6 points on all criteria, with high scores in Nutrient Accuracy (3.96), Personalization (3.81), and Practicality (3.99). The standalone LLM model performed worst on all criteria, with significantly lower scores than the other two methods. The standalone MILP model performed best in Nutrient Accuracy (4.93) and Variety (4.10) but lagged behind the MILP+LLM model in Practicality and Personalization.
Kruskal-Wallis and Dunn's tests showed that MILP and MILP+LLM outperformed LLM across all criteria. MILP was more accurate (p<0.0001), while the MILP+LLM model was more practical (p=0.021). Conclusions: The findings suggest that integrating an LLM with a MILP solver creates a model that combines qualitative personalization with quantitative precision. This model produces comprehensive, reliable meal plans, addressing the limitations of using either approach alone.
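The study compares the three models' Likert ratings with a Kruskal-Wallis test. To illustrate what that omnibus test computes, here is a minimal sketch of the H statistic in plain Python. It includes the tie correction, which matters for Likert data because many ratings coincide. The function names and example ratings are hypothetical and are not the authors' code or data. The sketch stops at H: its p-value would come from a chi-squared distribution with k - 1 degrees of freedom, and Dunn's post hoc tests are not implemented. In practice, `scipy.stats.kruskal` computes the same tie-corrected statistic.

```python
from itertools import chain


def average_ranks(values):
    """Rank all observations jointly (1-based), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum over groups of (rank sum)^2 / group size; ranks follow pooled order.
    total = 0.0
    start = 0
    for g in groups:
        group_ranks = ranks[start:start + len(g)]
        total += sum(group_ranks) ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tie groups.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    correction = 1 - ties / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0  # all-identical data: H = 0


if __name__ == "__main__":
    # Hypothetical 5-point ratings on one criterion (not the study's data).
    milp = [5, 5, 4, 5, 5]
    hybrid = [4, 4, 4, 3, 5]
    llm = [3, 2, 3, 3, 2]
    print(round(kruskal_wallis_h(milp, hybrid, llm), 3))
```

Ranking the pooled sample once and slicing it per group keeps the rank sums aligned with the order in which the groups are passed in.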
Li, Y.; Zhou, H.; Blackley, S.; Plasek, J. M.; Lyu, Z.; Zhang, W.; You, J.; Centi, A.; Mishuris, R.; Yang, J.; Zhou, L.
Ambient intelligence-based systems are increasingly used for clinical documentation. To quantify linguistic differences associated with ambient documentation, we conducted a matched pre-post analysis of 6,026 outpatient clinical notes from Mass General Brigham following implementation of two ambient AI documentation systems (Nuance Dragon Ambient eXperience [DAX] and Abridge). Within-clinician comparisons focused on the History of Present Illness (HPI) and Assessment and Plan (A&P) sections and evaluated syntactic complexity, lexical ambiguity, linguistic variability, discourse coherence, and readability. Manual review of 50 paired notes was performed to validate findings from automated linguistic analyses. Our analyses indicate that the linguistic effects of ambient documentation are both vendor-dependent and section-specific. Across both vendors, ambient notes in HPI were longer and exhibited greater syntactic complexity (longer sentences and clauses, increased dependency distance), lower lexical ambiguity, lower language-model perplexity, and higher local and global discourse coherence. These findings indicate that ambient systems systematically restructure conversational input into more syntactically elaborated and linguistically predictable narratives, reflecting increased standardization relative to both general-domain and biomedical language models. In contrast, changes in A&P were smaller and more heterogeneous, consistent with its more structured/templated nature. Readability analyses further showed increased length and lexical complexity in ambient HPI, whereas A&P readability differences were minimal. Overall, our findings demonstrate that ambient documentation changes how clinical information is linguistically expressed and organized, with effects varying by note section, vendor, and provider role/specialty. 
Evaluation should therefore extend beyond efficiency to consider effects on communication, cognitive load, clinical inference, and downstream analytics.
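The abstract does not name its readability or syntactic-complexity measures. As an illustrative assumption only, the sketch below computes two common surface metrics for a note section: mean sentence length and the Flesch Reading Ease score (206.835 - 1.015 x words/sentence - 84.6 x syllables/word). It uses a crude vowel-group heuristic to count syllables. The function names and example sentences are hypothetical. Dependency distance, perplexity, and discourse coherence would require a parser or a language model and are not shown.

```python
import re


def count_syllables(word: str) -> int:
    """Heuristic syllable count: number of vowel groups, minus a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)  # treat every word as having at least one syllable


def note_metrics(text: str) -> dict:
    """Mean sentence length (words) and Flesch Reading Ease for one note section."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)
    flesch = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    return {
        "mean_sentence_length": words_per_sentence,
        "flesch_reading_ease": flesch,
    }


if __name__ == "__main__":
    # Hypothetical HPI snippets: terse clinician style vs. elaborated ambient style.
    terse = "Pt reports cough. No fever."
    ambient = ("The patient reports a persistent dry cough that began "
               "approximately three weeks ago and has gradually worsened.")
    print(note_metrics(terse))
    print(note_metrics(ambient))
```

Naive splitting on periods will miscount sentences in clinical text that is full of abbreviations (e.g. "pt.", "b.i.d."). A real analysis would use a proper sentence tokenizer.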
Alkhatib, S. A.; Jiwa, N.; Judd, D.; Luningham, J. M.; Sawyer-Morris, G.; Ulukaya, M.; Molfenter, T.; Taxman, F. S.; Walters, S. T.
Large language models (LLMs) are increasingly used for qualitative analysis in substance use research, yet their performance relative to human coders remains underexplored. This study compares ChatGPT-4.0 with human coders in identifying and describing the core innovation of NIH grants focused on reducing opioid overdose. A total of 118 NIH HEAL Initiative grant abstracts were independently coded by ChatGPT and humans to generate innovation descriptions, which were then evaluated by both human raters and ChatGPT for depth/detail and relevance/completeness using 5-point Likert scales. Identical instructions were used across all coding and evaluation stages. ChatGPT-generated descriptions were consistently rated higher than human-generated descriptions on both dimensions. Human evaluators rated ChatGPT outputs at an average of 4.47 for both depth/detail and relevance/completeness, compared to 3.33 and 3.24 for human outputs, respectively (F(1,176)=133.9, p<0.001). These findings suggest that LLMs, when carefully prompted, can enhance the efficiency and quality of qualitative research evaluation.
Jansen, C.-P.; Braun, J.; Alvarez, P.; Berge, M. A.; Blain, H.; Buekers, J.; Caulfield, B.; Cereatti, A.; Del Din, S.; Garcia-Aymerich, J.; Helbostad, J. L.; Klenk, J.; Koch, S.; Murauer, E.; Polhemus, A.; Rochester, L.; Vereijken, B.; Puhan, M. A.; Becker, C.; Frei, A.
Background: Older adults' walking has so far been evaluated using standardised assessments of walking capacity in a clinical setting. By taking the evaluation out of the laboratory and into the real world, this study provides first evidence of the ability of Digital Mobility Outcomes (DMOs) to detect change over time and of their Minimal Important Differences (MIDs) in patients after proximal femoral fracture (PFF). This will guide the implementation of DMOs in research and clinical care. Methods: For this multicentre prospective cohort study, 381 community-dwelling older adults were included within one year after sustaining a PFF and assessed at two time points six months apart. Walking activity and gait DMOs were measured using a single wearable device worn on the lower back for up to seven days. A global impression of change question and three mobility-related outcome measures (Late-Life Function and Disability Instrument; Short Physical Performance Battery; 4 m gait speed) were used as anchor variables. To assess each DMO's ability to detect change, we calculated the standardized mean change as the effect size. To estimate MIDs, both distribution-based and anchor-based methods were applied, followed by expert triangulation whenever at least three anchor-based estimates were available for a DMO, resulting in single-point estimates. Results: All three anchor variables demonstrated substantial changes. Overall, 10 of the 24 available DMOs showed large and 7 showed moderate positive effects in the expected direction of the respective anchors. Seven DMOs showed no or only small effects. For 12 DMOs, at least three anchor-based estimates were available, enabling MID triangulation. MIDs for walking activity DMOs per day were a walking duration of 10 minutes, a step count of 1,000 steps, 50 walking bouts (WBs), and 15 WBs lasting longer than 10 seconds.
For gait DMOs, depending on walking bout length, MIDs for walking speed ranged from 0.04 m/s to 0.08 m/s, and MIDs for cadence from 4 to 6 steps/minute. Almost all DMOs showed a strong ability to detect improvement in mobility, but rarely to detect decline. Conclusions: For the first time, MIDs are presented for real-world DMOs in PFF patients. These MIDs inform sample size requirements and the interpretation of intervention effects in clinical trials, providing guidance and reassurance for clinicians and regulatory bodies.
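The Methods combine an effect size based on standardized mean change with distribution-based and anchor-based MID estimates. Below is a minimal sketch of those calculations for a single DMO. Several choices in it are illustrative assumptions that the abstract does not confirm:

- the denominator of the standardized mean change (here, the SD of the change scores, i.e. the standardized response mean);
- the half-SD rule for the distribution-based MID;
- Cohen's 0.2/0.5/0.8 cut-offs;
- the anchor category label.

The expert triangulation step is not modelled. All function names are hypothetical, and the example values are invented daily walking durations, not study data.

```python
from statistics import mean, stdev


def change_scores(baseline, follow_up):
    """Within-person change (follow-up minus baseline) for paired measurements."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired")
    return [f - b for b, f in zip(baseline, follow_up)]


def standardized_mean_change(baseline, follow_up):
    """Mean change divided by the SD of the change scores (standardized response mean).
    Assumption: the abstract does not state which SD the authors used."""
    changes = change_scores(baseline, follow_up)
    return mean(changes) / stdev(changes)


def effect_label(es):
    """Classify |effect size| using Cohen's conventional cut-offs (an assumption)."""
    size = abs(es)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "negligible"


def distribution_mid(baseline):
    """Distribution-based MID via the common half-SD rule (0.5 x baseline SD)."""
    return 0.5 * stdev(baseline)


def anchor_mid(baseline, follow_up, anchor, category):
    """Anchor-based MID: mean change among participants whose anchor response
    (e.g. a global impression of change item) falls in the chosen category."""
    changes = change_scores(baseline, follow_up)
    selected = [c for c, a in zip(changes, anchor) if a == category]
    if not selected:
        raise ValueError(f"no participants in anchor category {category!r}")
    return mean(selected)


if __name__ == "__main__":
    # Hypothetical daily walking duration (minutes) at baseline and six months.
    baseline = [30, 45, 20, 60, 35]
    follow_up = [42, 50, 28, 75, 40]
    anchor = ["slightly better", "better", "slightly better", "much better", "same"]
    es = standardized_mean_change(baseline, follow_up)
    print(f"effect size {es:.2f} ({effect_label(es)})")
    print(f"distribution-based MID {distribution_mid(baseline):.1f} min")
    print(f"anchor-based MID {anchor_mid(baseline, follow_up, anchor, 'slightly better'):.1f} min")
```

Keeping the distribution-based and anchor-based estimates as separate functions mirrors the abstract's approach, in which several estimates per DMO are produced first and then reconciled into a single point estimate.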
Barraclough, J. Y.; Ouyang, M.; Reading, M.; Woodward, M.; Rodgers, A.; Peiris, D.; Patel, A.; Neal, B.; Arnott, C.; Liu, H.
Aim: To outline the opportunities and barriers of using hairdressing salons as a novel site for enhancing cardiovascular risk factor assessment and management in women. Methods: A process evaluation nested within a cluster-randomised trial, Hairdressers for Health. The trial evaluated a nudge intervention advising women aged ≥45 years attending hairdressing salons to undertake a Heart Health Check with their General Practitioner. The UK Medical Research Council process evaluation framework guided the design, data collection, and analysis. Nineteen interviews were conducted with nine hairdressers, nine study participants, and a project officer. Thematic analysis assessed recruitment, reach, acceptability, and adoption. Characteristics of the salons and participants were analysed using descriptive statistics. Results: Recruiting the planned 88 metropolitan and 28 regional salons for the trial was challenging and required resource-intensive face-to-face visits. The nudge intervention was well accepted by participants, and salons were perceived to be an appropriate setting for reaching women effectively. Adoption of the study by salons was limited, with only 54 of the 116 salons recruiting participants (239 recruited in total; range 1-22 participants per salon). Barriers to participant recruitment included technological constraints of the decentralised online recruitment and data collection platform, client preferences, and privacy concerns. Established hairdresser-client relationships in smaller salons facilitated greater client participation and were perceived as a good mechanism for health promotion. Conclusions: Cardiovascular health prevention messaging for women in salons was acceptable to hairdressers and clients. Designing the study to make better use of personal hairdresser-client relationships may have improved project implementation. Trial Registration: ACTRN12621001740886
Patel, P.; Brown, S.; Markham, A.; Beckenstrom, A.; Friedemann, M.; Kingslake, J.; Highfield, J.; Summers, C.; Holmes, E. A.; Morriss, R.
Structured Abstract. Objective: This mixed-methods study investigated lived-experience perspectives on receiving a novel, brief digital mental health intervention after psychological trauma. The online gamified imagery-competing task intervention (ICTI) involves one researcher-guided session followed by self-use. In two randomised controlled trials (GAINS-01; GAINS-02), ICTI led to fewer intrusive memories at week 4, with the reduction sustained over 24 weeks, alongside reductions in post-traumatic stress. Here, we contrasted user experiences of ICTI with those of an Active Control (AC; music-listening task) and explored longer-term impact in qualitative interviews to contextualise the GAINS-02 findings. Methods and Analysis: The GAINS-02 trial randomised healthcare staff experiencing intrusive memories after work-related trauma to ICTI (N=40), AC (N=39), or treatment-as-usual (TAU; N=20). Expectancy was assessed before the researcher-guided session (Day 0), acceptability at week 4, and usage was tracked for 24 weeks. Semi-structured interviews (N=27) were conducted in the ICTI and AC arms only (15 at week 4; 12 between weeks 12 and 24). Interviews were analysed using reflexive thematic analysis. Results: Prior to use, many trial participants did not think the intervention would work, favouring AC over ICTI. However, after completing the tasks, participants found ICTI more acceptable and more relevant to intrusive memories than AC. After the one guided session, median ICTI usage over the next four weeks was 4.00 times, with little additional use (once more) over the following 20 weeks because of lack of need. Potential implementation facilitators included ease of use and advantages over existing interventions: not needing to talk about the trauma, brevity, and a smaller resource commitment.
Perceived barriers included a lack of staff and manager education about the nature and consequences of intrusive memories, as well as the need for workplace buy-in and a demonstration of organisational benefits. Conclusion: Healthcare staff experiencing workplace-related trauma found ICTI to be acceptable and effective for reducing intrusive memories with low effort and emotional burden, even among participants who initially expressed scepticism. Participants highlighted implementation considerations, including offering ICTI both within and outside the workplace and providing a self-guided version of ICTI with optional support. Future work should assess cost-effectiveness, impacts on presenteeism and retention, and real-world implementation, including the feasibility and effectiveness of a self-guided ICTI. Summary Box. What is already known on this topic: In a previous randomised controlled trial (GAINS-01) with Intensive Care Unit (ICU) staff exposed to work-related trauma, a brief online gamified imagery-competing task intervention (ICTI) reduced intrusive memories compared with usual care at four weeks. What this study adds: The GAINS-02 randomised controlled trial replicated GAINS-01 and extended its results by comparing ICTI with an active control (AC; music listening) task, enrolling hospital staff from outside the ICU, and extending follow-up to 24 weeks. Qualitative interviews found that, despite healthcare staff's initial scepticism before using the intervention, ICTI was more acceptable than AC. Participants attributed this to its specific effect in swiftly reducing intrusive memories and to its need for minimal support or usage after an initial researcher-guided session. After one guided session, ICTI was used 4 more times in the first four weeks, with little additional usage (once) thereafter because of lack of need (i.e., no longer experiencing intrusive memories).
How this study might affect research, practice or policy: ICTI is an efficacious, scalable intervention for relieving staff of intrusive memories, with effects sustained for at least 6 months. Participants found it more acceptable than the alternatives, and it requires less time commitment than standard psychological treatments.